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^ ' Abstract 



We introduce a time-parallel algorithm for solving numerically almost 
integrable Hamiltonian systems in action-angle coordinates. This algo- 
rithm is a refinement of that introduced by Saha, Stadel and Tremaine in 
1997 (SST97) for the same type of problems. Our refined algorithm has a 
, better convergence obtained from the use of derivatives of the perturbing 

term not considered in the original SST97 algorithm. An advantage of 
this algorithm is its independence of the step-size for the parallelized pro- 
, cedures which can be consider as a particular case of the parareal scheme. 

1 Introduction 

> , 

■"sj" , Many authors agree that the first suggestion of some time-parallel solution for 

scalar ordinary differential equation was proposed by Nievergelt in 1964 and 
led to the so called multiple-shooting methods 26 . A few years latter (1967), 
Miranker and Liniger proposed a family of parallel Runge Kutta methods for 
small scale parallelism, based on the predictor-corrector method [5S]. In 1982, 
Lelarasmee, Ruehli and Sangiovanni-Vicentelli introduced the waveform relax- 
ation methods (WR) in [3T]. WR is based on the decomposition of a complex 
system of mixed implicit equations into a system of single implicit equations. 
After decomposition, each implicit equation can be solved independently of 
■ the others and consequently the system can be solved in parallel in a natural 

^ I way. The first implementation of some time-parallel algorithm for ODE systems 

which takes advantage of those three methods is given by Bellen and Zennaro 
inia. 

In the context of almost integrable Hamiltonian systems, Saha, Stadel and 
Tremaine introduced, in 1997, a parallel method for the computation of orbits 
for the Solar System dynamics }29) . In part, their work is the continuation of 
other papers published in 1992 et 1994 about symplectic integrators and long- 
term planetary integration [271 [2E]- Simultaneously to Saha et ai, Fukushima 
introduced an alternative method for to obtain numerically a global solution 
of ODE systems [TTl [T^l [13] ■ His method consists in to use the Picard itera- 
tion method to approximate iteratively a global solution. Such solutions will be 
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expressed in terms of Chebyshev polynomials to accelerate numerical computa- 
tions. Between 1998 and 2000, Erhel and Rault worked on a parallel algorithm 
applied to the computation of satellite's trajectories [5]. In the same approach 
as Bellen and Zennaro, they implement a multiple shooting technique but in- 
stead of the WR method they use the fixed point theory and Newton iterations. 
They use automatic differentiation \17\ in order to save time when computing 

the Jacobians Jp fu'f'' ) for the Newton iterations. 



The parareal algorithm was introduced by Lions, Maday and Turicini in [22] 
and modified by Bal and Maday in [S] . It had received with interest some years 
latter [U |3j IH [T6j [T5j |30] . This algorithm is based on a coarse-discretization 
predictor (solved sequentially) and a fine-discretization corrector (solved in par- 
allel). In the very beginning the pararreal algorithm was used to accelerate 
numerical solutions for parabolic and elliptic systems of PDEs. However, for 
hyperbolic systems and highly oscillatory problems, the coarse solver cannot 
predict the fine solution in a satisfactory way. Several attempts has been tested 
in refinning the coarse solver [3j [71 [6j [TOl [1] and the more recent refinement is 
a symmetric scheme with projection due to Dai, Le Bris, Legoll and Maday [5]. 
This symmetric algorithm was tested for solving almost integrable Hamiltonian 
systems with good results. However, for long-term computations of highly os- 
cillatory systems we must to reduce the local error to the size err ^ 10~^^, 
it means, to the £-machine, and this is achieved for several (more than 10) 
iterations of the parareal scheme. 

In this paper we propose a refinement of the SST97 algorithm looking for to 
accelerate the solution and to preserve the accuracy of the sequential integrator. 
In fact, our algorithm converges 100% to the sequential underlying integrator 
althrought the cost of the corrector step is high. Our current research is about 
the economy of the corrector step and the first results are documented in [IB] . 

2 The time- parallel algorithm 

The method we introduce in this paper is concerned with numerical solutions of 
almost integrable Hamiltonian systems. Although the exposition will be done 
for Hamiltonian vector fields with Hamiltonian perturbing part, at this point 
we do not know about any restriction to apply this method to vector fields with 
non- Hamiltonian perturbation. 

This method is an extension of the Saha, Stadel and Trcmaine algorithm 
[29] and it is based in the multi-shooting and Picard's iterative methods, as 
well as the theory of almost integrable Hamiltonian systems. We start with the 
Picard's iterative method for solving differential systems. 

2.1 Picard's iterative method 

Let's consider the initial value problem (IVP) 




y(t) = /(t,y(t)), 



2/(0) = yo- 



(1) 



where y : [0, T] R™ and / : [0, T] x R" -> 
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Integrating both sides of ((T|) from zero to i G [0,T] we obtain the system of 
differential equations written in its integral form 

y{t)=yiO)+ f fis,yis))ds, te[0,T]. (2) 
Jo 

Applying Picard's iterative method, we can approximate the solution for every 
t€[0,T] by 

y''+\t)=y^+ f f{s,y\s))ds, t&[Q,T]. (3) 
Jo 

If ?/o belongs to some convex domain 2? C M'" around the solution and if 
F{t,y{t)) is Lipchitz in V x [0,r], Picard's theorem assures the convergence 
of its iterative method. An iteration of the Picard's method approximates the 
solution in space and we say that it is a vertical iteration. 

In order to parallelize the numerical computations, we discretize the problem 
by partitioning [0, T] in N smaU slices of size At — T/N. We set tg = 0, t„ = T 
and ti = iAt such that 



^ to < ti < t2 < ■ ■ ■ < t, < ■ ■ ■ < tN = T. (4) 

Then the i-th slice is Ati — [ti,ti+i] for j = 0, • • • , — 1. In the same way we 
write yn = y{tn) for the value of the solution at time t„ and for the approxima- 
tions of the Picard's method we will use the superscript = y^(tn)- 

Using this discretization we use the linearity of the integral to rewrite Q 
like a sum of integrals in the form 

y'+'iT) = yo + Y. f{s,yHs))ds, se[t,^i,t,]. (5) 

The solution at time i„ is approximated by the (fc + l)-th iteration denoted 
by y^+i = y'^^^itn) with partial sums 

n ■ 

/+'(in) = yo+Y^I f{s,y''{s))ds, 1 < n < A^. (6) 

Developping the sums we obtain an iterative scheme for the time by 

y'-'Hto) = y'oi-yo) (7) 

= y''+\t^)+ J^"^' f{s,y''{s))ds, n = 0,...,N-l.{8) 

We call the iterations in time horizontal iterations. The reader must note that, 
until now, all computations are exacts. The discretization just give us the way 
to combine vertical and horizontal iterations. 
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2.2 Perturbed systems 

Since we are interested in almost integrable Hamiltonian systems, we will use 
perturbed IVP in the form 

f{y{t)) + e9{t,yit)), j/(o)=j/o, (9) 

where g :[0,T]x — > is the perturbing function with perturbing param- 
eter e ^ 1. Its integral expression is 

yit)=yiO)+ f fiyis))ds + e f gis,yis))ds, t € [0,T], (10) 
Jo Jo 

and the bi-iterative scheme is 

y'^'ito) = y'oi-yo) (11) 



y 



'+'(Wi) = y'+\t,,)+ f{s,y''{s))+eg{s,y'^{s))ds, (12) 



for n = 0,...,A'^ — 1. This bi- iterative scheme permit us to compute the definite 
integral from the preceding iteration k and to make a correction with the values 
of the new iteration A; + 1. It means that the integrals can be computed in 
parallel with some numerical scheme and a small stepsize 6t — At/N^^ and 
perform the corrections in sequence, just as in the parareal scheme. However, 
applying this bi-iterative scheme directly will introduce a lot of errors to mid 
and long term computations and it converges very slowly. Seen as a predictor- 
corrector scheme, it corresponds to the identity map as the predictor, which 
produce a so far approximation. 

One way to boost the convergence of the method is to consider the problem 
of solving ([T^ as a system of algebraic equations. We denote the integrals by 

Fiy')= r f{s,y\s))ds 
G{y')= r 9{s,y\s))ds 

and we consider a„ := y'^~^^{tn) as a fixed parameter (it corresponds to the 
initial condition to the sub-problem when t 6 [tn,tn+i])- Then, we want to 
solve 

=an + F(2/'=) + eG(z/), 

as the fixed point problem 

y = a„ + F(j/) + eG(y), y e M". (13) 

Let y* e R™ be a solution for p^ . then for every y G M'" the following 
identity holds 

y* = an + F{y*) + eG{y) + e {G{y*) - G{y)) . 
It implies that the iteration 

= a„ + F(/+i) + eG(/) + e (G(/+i) - G(/)) , (14) 



4 



will converges to the solution if a„ is close to y* . Returning to the integral form 
for F and G we obtain the iteration 



VnXl = yn+' + I' f{s, y''+')ds + e f gis, y')ds + cR (15) 
where = y''(t„), a = t„, & = i„+i and 

R^( [ g{s,y^+')ds- I g{s,y')ds\, 



is a remainder which goes to zero when y'^ -+ y''^^. 

Remark 1 In '291, the authors considered 

e (G(/+i) - G(/-)) ^ eDG{y'+') ■ {y^+' - /), 

as a second order term and their bi-iterative scheme is just 

2/'=+i=a„+F(y^-+i) + eG(/) (16) 

(expression (9) in f29f ). 

The rule (I14p define an implicit difference equation which can be solved by 
some iterative scheme for algebraic equations, for example the WR algorithm 
[2]. However, for the case of almost integrable Hamiltonian systems, there are 
explicit (symplectic) algorithms for which the numerical computation of F(ij^~^^) 
has no additional cost. 

2.3 Almost integrable Hamiltonian systems 

An almost integrable Hamiltonian system in generic canonical variable^ has a 
Hamiltonian function in the form 

H{t,p,q) = Hint{p,q)+eHper{t,p,q), p,qeW,teR, (17) 

where 

• Hint {p, q) is the integrable term which is independent of the time, 

• Hper{t,PTq) is the perturbing term which can depends on p, q and the 
time t as well, 

• e is the perturbing parameter and it gives the size of the perturbation, 
and 

• r is the number of degrees of freedom (DoF) of the system. 

If we define y = (p, q) G M^'' we can write the Hamiltonian vector field for 
the almost integrable problem as 

y{t)^ JVyHjnt{y{t))+eJVyHper{t,y{t)), yeR^^, telcR, (18) 



^Not necessarily action- angle variables 
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where 

) ' I,OeMr{R) (19) 

is the canonical symplectic matrix in Af2r(IR) and Vj, means the gradient with 
respect to y. 

From (fTS|) we see that 

f{y{t)) ^ JVyHiMt)) and g{t,y{t)) ^ J\7yHp,r{t,y{t)). (20) 

Now we return to expression ([TS]) and we spht it in p and q to obtain two 
iterative rules 



PnXi = Prr+ hiy'^ds + e g,is,y')ds + eR, (21) 



J a J a 

where a ^ tn, b ^ t„+i, / = (/i,/2), g = {91,92), R = {Ri,R2) and 



(22) 



R^^yj 9^{s,y''+lds- j gd.s,y')d.sj , i ^ 1,2. 

The most important fact obtained from these expressions is that perturbing 
terms depend only on the values obtained in the last iteration. This implies 
that we can compute them in parallel and attach those values in the sequential 
correction. 

In the rest of this work we will be interested in almost integrable systems in 
action-angle coordinates with separable perturbing function in the form 

H{t,p,q) - HiM+<HPp^rit^q)+H],^^{t,p)), 

in order to use symplectic integrators. With this restriction, (|20p becomes 

m^^^ip), and git,q,p)=(-^^iLq),^^it,p)). (23) 



dp \ dq ' dp 

Additionally, we must separate the remainders to be computed in an indepen- 
dent step. The iterative scheme reduces to 

^^'t! = p't'-^[^-^is,q')ds 



with the same notation as above. 

The two first steps correspond to the SST97 algorithm if we use the symplec- 
tic implicit integrator of middle point. We extend the SST97 algorithm using 
explicit symplectic integrators of higher order and the second order term ei? 
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We take expression (1^ and we separe g{t, q,p) as 

gp{t,q)^^^{t,q) and g^{t,p) ^ ^^^{t,p), 

where the subscript p (respectively q) means that gp (resp. gq) is the perturbing 
term of the p (resp. q) equation. 

Our "toy" parahel leapfrog scheme is 

= Pt'-jegpiq^.) 

IXl = 9r' + '5t(/,(p,';t|) + ^5.(^'«+i)) 

Pn+1 = Pn+1 - 1^^9p{q„+l) 

VnXl = y'+f +Ate(g(2/':f)-5(2/';n): 

which is an exphcit scheme. In this case, 5t ~ At The reader must note that 
this scheme is more expensive that the original sequential scheme. In order to 
have a gain on the algorithm we must concatenate several leapfrog schemes or 
other explicit symmetric symplectic integrators. 

In fact, we can see this procedure like a parallel composition method for sym- 
metric symplectic integrators, equivalent to those used by Suzuki |31) . Yoshida 
[5^ . McLachlan [21 [H] and Laskar-Robutel P^ . 

For simplicity, we can put together j steps of the leapfrog scheme to get 

pt^ - ^^gpiin) 

^'^^'+St{fq{pll\^)+egq{pl^^)) 
lC^~Stegp{qt^^ 

ytXi + (.9 (ytXi) - .9 (yn+i)) ■ 

In this case we have At = j ■ St meanning that the second order term will be 
computed once every j ^t-steps. More complex symplectic integrators can be 
used instead of the leapfrog scheme like the SAB An or SBABn integrators from 
Laskar and Robutel [T5] . 

3 The algorithm 

The original algorithm was designed by Saha et al. and introduced in [25] to 
compute planetary orbits for Solar System dynamics. The integrable part of the 
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Hamiltonian function corresponds to the Keplerian orbits of the planets around 
the Sun (in elhptic coordinates) and the perturbing term corresponds to the 
interplanetary forces (in Jacobian ccordinates). 

Our algorithm uses a high-order multistep integrator where each local per- 
turbing value must be saved and passed to the master process. In order to avoid 
"bottle-necks" in the communication it is recommended to use this algorithm 
in a shared memory environment. 



Converged 




Figure 1: A shifting parallelization window computes P intervals of size Ai in 
parallel. Every interval /St is decomposed in smaller intervals of size 5t 

We select an explicit symmetric symplectic integrator _F, for instance, the 
leapfrog or any of the SAB An or SBABn integrators 19 . We concatenate j > 1 
schemes to be compute in parallel by each thread. We define 5t to be the time 
step of a single F scheme and At = j ■ St the interval computed in parallel. 

Finally, we open a shifting parallelization window which contains P intervals 
of size At. P is the number of threads or, if each thread runs on a single 
processor, P becomes the number of processors in the parallel machine. After 
each iteration, we verify how many At intervals have converged and we shift the 
window until the first non converged At interval (see Figure [1]). 

We denote by 

^(viili) - (a) 

the z-th numerical value for the integrable part and by 

the corresponding z-th value for the perturbing part of the Hamiltonian function. 
The coefhcients Ci and di, i = are selected based on the symplectic 

underlying integrator. 

We show the general structure of the algorithm in the following frame 
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Algorithm 1. Jimenez-Laskar. 



Setup of the initial guess sequence 
= 2/(0), y°+i= 5(2/0) 
G° =0 
While r < iV do 

Parallel resolution on for r < n < r + P: 

For i=l to j 

compute (yl^l±^ + (^2/1+ : 

save © (^nl -' 

End (For i), 
(fc) 

Ur+l = Vr+l 

conv = 1. 

Converged=TRUE. 
Head^TRUE. 
Sequential corrections: 

For n = (r + 1) to (r + P) 
For i=l to j 

compute yl^^) = (^2/1^^) + e@ (^y^^ 

End (For i) 

compute Gi'^,^^ - {vi'+i^^) 

If Head=TRUE 

If Converged=TRUE 

(fe+i) 

Un+l = Vn+i 

conv — conv + 1 . 
Else 

r = r + conv (shifts the window). 

Converged^FALSE 

Head=FALSE 

begins the next parallel iteration 
with the available processors. 
End (If Converged=TRUE) 
End (If Head^TRUE) 
End (For n). 

End (While). 



where the sequence {yli^} is the k-ih approximation to the solution {u„}. It 
is important to note that this algorithm converges exactly to the sequential 
underlying integrator. 



4 Numerical examples 

We have tested our algorithm with the simple Hamiltonian pendulum and the 
Spin-orbit problem. We have implemented the code in TRIP [T4' simulating 
the parallel implementation for several values of P in order to find P* which 
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optimizes the speed-up function Sp. The discussion about this subject will be 
treated in the next section. 

Hamiltonian problems for very long times. In order to reach this, we have 
tested the algorithm with no tolerance in the error between the sequential so- 
lution and the parallel aproximation. It means that we test err==0 at every 
iteration where err=\\yseq{tn) — y^'^H'^n)]] is the numerical Euclidian distance 
between the sequential solution yseqitn) and the fc-th approximation y^''\tn) at 
time tn- Of course, if we relax this condition to the machine tolerance or to 
the size of the Hamiltonian error at each At interval we will have a gain in the 
speed-up function. 



flSI I 
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Iteration k = 10 




Iteration k = 10 


Iteration k = £0 




Iteration k = 20 


Iteration k = 30 




Iteration k = 30 






Iteration k = 4-0 






Iteration k = 50 







Iteration k = GO — , 






Iteratiori k = 70 






Iteration k = SO 




100 200 300 4O0 5O0 600 700 800 900 1000 
Dt 



100 2O0 300 400 500 600 70O BOO 900 lOOO 
Dt 



Figure 2: The Hamiltonian pendulum. Square of the Euclidean distance be- 
tween the sequential and the parallel solutions in logj^Q scale. Each curve cor- 
responds to the fc-th iteration for k — 10, 20, .... The ASD parallel algorithm 
takes 36 iterations meanwhile the Parareal algorithm takes 109 iterations to 
total convergence {err = 0). 



4.1 The simple Hamiltonian pendulum 

The first example is the simple Hamiltonian pendulum with equation 

H{p,q) = ^p^~ecosq (24) 

which we integrate with a SBAB4 symplectic integrator of order 0{T^e + r^e^) 
[To] . We have made several test for different values of e, St, At (in St steps), the 
size of the simulation N (in St steps) and the size of the shifting parallelization 
window P (At intervals in parallel). 
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Notation 1 We have de following notation: 



• Ca* will denote the mean number of At intervals converged to the sequen- 
tial solution at each iteration. 

• I(t,p) "^^^^ denote the mean number of iterations needed to make converge 
P intervals of size At. 

• ^(At.P) 'will denote the total number of iterations needed to obtain the 
sequential solution. 

These values are related by the following relationships 

N N 

P = CAt X /(At.p) and fc(At,P) = — /(At,p) = -7;— (25) 

jP ' jCAt 

where j e N is such that At — jSt 

Table 1 shows values obtained for a simple test with parameters e = 0.01, 
5t ^ 0.01, At = 1, N = 10^ and the initial conditions {po,qo) = (1,0). In 
this case, we concatenate 100 St steps in a At interval and the number P of At 
intervals computed in parallel varies from 50 to 500 in steps of 50 intervals. 



p 


Ca* 


I(At,P) 


k(At,P) 


50 


6.9728 


7.17072 


1434 


100 


12.018 


8.32083 


832 


150 


16.3918 


9.15092 


610 


200 


20.5318 


9.74097 


487 


250 


24.3285 


10.276 


411 


300 


27.6981 


10.8311 


361 


350 


30.6718 


11.4111 


326 


400 


33.7804 


11.8412 


296 


450 


36.36 


12.3762 


275 


500 


38.9066 


12.8513 


257 



Table 1. Values obtained for different sizes of the shifting 
parallelization window P. The values for the parameters are 
e 0.01, 5t = 0.01, At = 1, iV = 10^ and T = 10^. 

We have made several tests for different values in the parameters and we 
noted that Cai, I(At.P) and fc(At,p) have, in all cases, the same qualitative 
behavior. In particular, /(At,p) has an almost linear behavior then we have 
fitted a straight line I{P) ^ I(At,p) by least squares technique. Expressions for 
Cai, I(At,p) and fc(At,p) using are: 

Cai = . , ^ fcx , -^(At,p) = 3{aP + b) and fc(At,p) = ( a + 4 ) ^- (26) 
j(a+pj \ PJ 

Figure [3] shows the data of Table 1 with the fitted curves for the Hamiltonian 
pendulum with parameters a = (2.51)~* and h — (7.64)^^. 
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Figure 3: Plots of the data from Table 1 for Ca*, I{At,p)j and k^At.p) and its 
fitting curves with parameters a ~ (2.51)^"* and b = (7.64)^^. 

4.2 The spin-orbit problem 

The next example is the spin-orbit problem with equation 

H{p,q) = - ecos2g - a(cos(29 + 0) - 7cos(2g- (/))) (27) 

which we integrate with the same symplectic integrator as in the pendulum 
case. We have used the parameters a = 10~^ and (j) = 0.2 in addition to the 
values of the last example. Table 2 shows values obtained for a simple test of 
the spin-orbit problem. 



p 


Ca* 


I(At,P) 


k(At,p) 


50 


6.08952 


8.21082 


1642 


100 


9.81256 


10.191 


1019 


150 


12.8028 


11.7162 


781 


200 


15.3594 


13.0213 


651 


250 


17.6039 


14.2014 


568 


300 


19.5293 


15.3615 


512 


350 


21.2745 


16.4516 


470 


400 


22.9335 


17.4417 


436 


450 


23.9211 


18.8119 


418 


500 


24.8731 


20.102 


402 



Table 2. Spin-orbit problem: values obtained for different sizes of the 
shifting parallelization window P. The values for the parameters are 
e = 0.01, a = lO"-*, 4) = 0.2, 5t = 0.01, At = 1, iV = lO*' and T = 10"*. 

Figure W?^ shows the data of Table 2 with the fitted curves for the spin-orbit 
problem with parameters a = (1.19)"'* and b = (7.11)~^. 
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20 
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30 
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50 
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Figure 4: The spin-orbit problem. Square of the Euchdean distance between 
the sequential and the parallel solutions in logj^p scale. Each curve corresponds 
to the k-th iteration for k = 10, 20, .... The ASD parallel algorithm takes 
54 iterations meanwhile the Parareal algorithm takes 135 iterations to total 
convergence {err — 0). 



4.3 Speed-up of the parallel algorithm 

In this subsection we find an expression for the speed-up of the algorithm with 
parallelization window. 

Definition 1 We call speed-up, denoted by Sp{N), to the ratio of the time 
Ti(iV) required to solve a given problem using the best known serial method to 
the time Tp{N) required to solve the same problem by a parallel algorithm using 
P processors 

where N denotes the size of the problem. 

We use an alternative notation inverting the parameter P and the variable 
N since we are interested in finding some expression for the speed-up Sn (P) in 
terms of P. The classical notation is Sp{N). 

Let T-p be the predictor's computing time for a At interval and let Tc be the 
corrector's computing time for the same interval. We suppose we have a first 
estimation of the solution. Then, the speed-up function for the parallel SABAn 
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Figure 5: Plots of the data from Table 2 for the spin-orbit problem. We show 
Caii I(At,p)7 and k^j^i p-f and its fitting curves with parameters a = (1.19)^'' 
and &= (7.11)^2 



with shifting parallelization window P is given by 



SNiP) 



k(At,p) [Tv + PTc) ' 



where k(^^f p-^ e N is the number of iterations to converge. This is the worst 
case, since with our algorithm we do not wait for ending the P corrections. In 
fact, a more accurate expression is obtained substituting P by {P — Cai)- Since, 
it is desirable to have an expression in terms of P only we use equations (1261) to 
obtain such expression. 

In what follows, we assume kj^^^ pj — k{P) has an analytic expression equiv- 
alent to (pS)) as in the cases for the Hamiltonian pendule, the spin-orbit problem 
and the planetary problem (see |18jV After some simplifications and considering 
P — Cai instead of P, the formula for the speed-up becomes 

where 

+ and 
\Tc a jaj aTc 

Expression (pS)) does not depend on the size of the problem N (see Figure E])- 

Since we have a quadratic polynomial in the denominator, we may have 
vertical asymptotes in its roots. We impose the condition S{P) > for P > 
which implies P"^ + BP + C > 0. Additionaly, we have that 5(0) = 0, S{P) > 
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Speed-Up 




Figure 6: The Speed-up function with parameters a — (1.19) ^, 6 = (7.11) ^, 
j = 100 and ^ = 10. 

if P > and since limp_i>oo S{P) = then there exists a critical point P* which 
maximize S{P). We have the following 



Proposition 1 // the function k{P) has the form !i26\) . then S{P) has a max- 
imum in P* which optimize the parallel algorithm. Moreover, the theoretic 
optimal speed-up is 



S{P*) 



1 



2a. 



Tc 



(29) 



Proof. We consider expression ([25)) in the extended case 5 : R — > R and we 
obtain a differentiable function for P > such that the original problem is 
just the restriction 5|n. We procede in the classical way looking for the critical 
points by differentiation. Since 



dP ^ ' aTc (P2 



C-P2 



BP + cy 



Then the only positive critical (in fact the maximum) point for the extended 
problem is P* — 



We define 

ale 



p* 



[P,] if Sn{[P.])-S{\P,] 
[P,] + 1 otherwise. 



1) < 



where [•] is the maximum integer function. Finally, for the case where iS([P,]) — 
iS([P,] + 1) we know that [P,] is optimal then P* is well defined. 

The second affirmation is obtained directly by substituting P* = \f^^ 

directly in (pgj) . 

□ 
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Once we know the optimal value for P* as a function of ^ , we are interested 
in the gain, the speed-up which we can reach with this strategy of parallelizing 
windows. We write the speed-up function as a function of the ratio ^ 
with parameters a, b and j as 

SoptiP*) = 7^-^ (30) 

a + 2^ + (6- 1 



From expression (j30[) we obtain another useful result 

Corollary 1 Let k(^£^t,P) be the number of iterations needed to convergence for 
the parallel algorithm where p) is as in \2b]] and the parameters a, b and j 
are fixed for some particular problem. The speed-up of the parallel algorithm is 
bounded by 

S{P) < -. (31) 
a 



Proof. Take the limit for expression (|30[) when P* — > oo. 

□ 



5 Additional discussion 

What we gain with this algorithm is the time computation of the perturbing 
terms. Since we have concatenated 100 SBAB4 schemes, we have called 500 
times the perturbing function in each parallel step (5 times in each SBABi). In 
the sequential correction we gain in time 500CAt?B where Tg is the time used in 
the computation of the perturbing term. However, we must call the integrable 
function 400P times in the correction step. It means that this algorithm works 
in problems where T4 <^ Tg like the spin-orbit problem. This is clear from the 

optimal value of the shifting window P* = ^ in expression and the 

fact that T-p = j{ATA + 5Tb) and Tc = AjTa + T^. We have 

Tc ijTA + Tb AjTa + Tb^ Ta 

Moreover, the time of one iteration for this algorithm using the SBABn 
scheme as underlying integrator is 

Tr + iP~C\t)Tc - ji^TA + 5TB) + iP-C^t)i^jTA + TB) 
= 4j iP~CAt + l)TA + (4j +P-CAt)TB 

where j and P must be selected such that the following conditions are fulfilled: 

1. T-p = iP — CAt)Tc, this is important to avoid, at maximum, the dead-time 
in the threads. 

2. T-p -\- {P — CAt)Tc be minimal, in order to obtain a large speed-up. 



16 



The reader must note that for large values of P or j the algorithm might not 
work if T4 ~ Tg are comparable. In those cases we have used an alternative 
corrector scheme to parallelize the La2010 solution (see [IODj such scheme re- 
duces the time Tc but increases the number of iterations needed to convergence 
(see [13). Finally, for higher dimensional Hamiltonian problems as the Solar 
system dynamics, we find that the value of j is very restricted. Some tests 
with the La2010 solutions [2^ accept only a maximum of 8 SABA4 schemes 
concatenated in a At interval, and for j > 8 the value of Ca* = 1 which is a 
worst case than the sequential solution. 
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